

Search for: All records

Creators/Authors contains: "Huang, Biao"


  1. To determine the error rate of transcription in human cells, we analyzed the transcriptome of H1 human embryonic stem cells with a circle-sequencing approach that allows for high-fidelity sequencing of the transcriptome. These experiments identified approximately 100,000 errors distributed over every major RNA species in human cells. Our results indicate that different RNA species display different error rates, suggesting that human cells prioritize the fidelity of some RNAs over others. Cross-referencing the errors that we detected with various genetic and epigenetic features of the human genome revealed that the in vivo error rate in human cells changes along the length of a transcript and is further modified by genetic context, repetitive elements, epigenetic markers, and the speed of transcription. Our experiments further suggest that BRCA1, a DNA repair protein implicated in breast cancer, has a previously unknown role in the suppression of transcription errors. Finally, we analyzed the distribution of transcription errors in multiple tissues of a new mouse model and found that they occur preferentially in neurons, compared to other cell types. These observations lend additional weight to the idea that transcription errors play a key role in the progression of various neurological disorders, including Alzheimer’s disease. 
  2. Surfacing and mitigating bias in ML pipelines is a complex problem, and data scientists urgently need system-level support for it. Humans should be empowered to debug these pipelines in order to control for bias and to improve data quality and representativeness. We propose fairDAGs, an open-source library that extracts directed acyclic graph (DAG) representations of the data flow in preprocessing pipelines for ML. The library then instruments the pipelines with tracing and visualization code to capture changes in data distributions and to identify distortions with respect to protected group membership as the data travels through the pipeline. We illustrate the utility of fairDAGs with experiments on publicly available ML pipelines.
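
The idea in the second abstract, tracing how the share of each protected group shifts as data moves through preprocessing steps, can be illustrated with a minimal sketch. This is not the fairDAGs API: the function names (`group_shares`, `trace_pipeline`), the toy records, and the `group` column are all hypothetical, and the pipeline here is a simple chain of steps, which is only the most basic case of a DAG.

```python
from collections import Counter

def group_shares(rows, protected_key):
    """Return the fraction of rows belonging to each protected group."""
    counts = Counter(row[protected_key] for row in rows)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}

def trace_pipeline(rows, steps, protected_key):
    """Run the steps in order and record group shares after each one.

    Hypothetical helper: a linear chain of steps stands in for the
    extracted DAG described in the abstract.
    """
    trace = [("input", group_shares(rows, protected_key))]
    for name, step in steps:
        rows = step(rows)
        trace.append((name, group_shares(rows, protected_key)))
    return rows, trace

# Two illustrative preprocessing steps (assumed, not from the source).
def drop_missing_income(rows):
    return [r for r in rows if r["income"] is not None]

def keep_adults(rows):
    return [r for r in rows if r["age"] >= 18]

# Toy data, invented for this example.
records = [
    {"age": 34, "income": 52000, "group": "A"},
    {"age": 29, "income": None, "group": "B"},
    {"age": 41, "income": 61000, "group": "A"},
    {"age": 17, "income": 8000, "group": "B"},
    {"age": 52, "income": 48000, "group": "B"},
    {"age": 23, "income": None, "group": "B"},
]

steps = [
    ("drop_missing_income", drop_missing_income),
    ("keep_adults", keep_adults),
]

cleaned, trace = trace_pipeline(records, steps, "group")
for name, shares in trace:
    print(name, {g: round(s, 2) for g, s in sorted(shares.items())})
```

In this toy run, group B makes up two thirds of the input but only one third of the final data, which is the kind of distortion such tracing is meant to surface. Each step looks harmless on its own, so recording the distribution after every step is what makes the shift attributable.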